Multiversion Concurrency Control (MVCC)
Introduction
Multiversion concurrency control (MVCC) is a method that gives users a concurrent and persistent view of distributed transactions across partitions. It is a database optimization technique that keeps multiple copies of records so that data can be safely read and updated at the same time. GigaSpaces keeps multiple versions of modified entries to ensure that users have a persistent view of the data that is consistent with the system of record (SoR).
Processing a large number of simultaneous transactions in Smart DIH requires an extreme write throughput that cannot be paused. (Smart DIH allows enterprises to develop and deploy digital services in an agile manner, without disturbing core business applications, by creating an event-driven, high-performing, efficient, and available replica of the data from multiple systems and applications.) To maintain transactions in the platform, Space objects must not be locked. (The Space is where GigaSpaces data is stored: a logical cache that holds data objects in memory, and possibly in tiered storage, with data from multiple SoRs consolidated into a unified data model.) The MVCC mechanism provides an efficient solution, allowing massive updates while keeping the Space consistent with the systems of record (SoR). In this manner, the ACID properties of transactions (Atomicity, Consistency, Isolation, and Durability) are maintained, ensuring the consistency and integrity of the data before and after each update, even in highly available distributed systems.
For more information about the MVCC mechanism and how it is used in Smart DIH (Digital Integration Hub), read our blog on How to Achieve ACID Compliance on Distributed, Highly Available Systems (search for MVCC).
MVCC Flow
The diagram below shows how an update coming from the SoR to a Change Data Capture (CDC) stream travels through the Data Integration (DI) layer and is finally updated in the Space. CDC identifies and captures changes made to data in a database, enabling real-time integration and synchronization between systems. The DI layer ingests data in batches or as streaming changes from the various sources and systems of record.
-
In the Space, only the area shown in pink is visible to the user.
-
All newer updates (shown in blue) are written on top of that data and are not visible to the user.
-
When the update is applied (by the DI), that data becomes visible to the end user, and the MVCC update cycle continues.
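The visibility rule described in the flow above can be illustrated with a small Java sketch. This is a toy model only, not the GigaSpaces implementation: the class `VersionedStoreSketch` and its `stage`, `commit`, and `read` methods are hypothetical names chosen for this example. It shows newer versions being stacked on top of an entry while readers continue to see the last applied version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy multi-version store (hypothetical, for illustration only). */
class VersionedStoreSketch {

    /** One version of an entry; 'committed' becomes true once the update is applied. */
    private static final class Version {
        final long number;
        final String value;
        boolean committed;

        Version(long number, String value) {
            this.number = number;
            this.value = value;
        }
    }

    // Versions per entry ID, kept in ascending version order.
    private final Map<String, List<Version>> versionsById = new HashMap<>();
    private long nextVersion = 1;

    /** Stacks a new version on top of the existing ones; readers cannot see it yet. */
    long stage(String id, String value) {
        long number = nextVersion++;
        versionsById.computeIfAbsent(id, k -> new ArrayList<>()).add(new Version(number, value));
        return number;
    }

    /** Applies a staged version, which makes it visible to readers. */
    void commit(String id, long number) {
        for (Version v : versionsById.getOrDefault(id, Collections.emptyList())) {
            if (v.number == number) {
                v.committed = true;
            }
        }
    }

    /** Returns the newest applied value for the ID, or null if nothing has been applied. */
    String read(String id) {
        String visible = null;
        for (Version v : versionsById.getOrDefault(id, Collections.emptyList())) {
            if (v.committed) {
                visible = v.value; // later list entries are newer versions
            }
        }
        return visible;
    }
}
```

In this sketch, a reader calling `read` between `stage` and `commit` still receives the previous value, which corresponds to the pink area in the diagram, while the staged version corresponds to the blue area.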
MVCC Configuration Properties
Name | Type | Default Value | Description |
---|---|---|---|
space-config.mvcc.enabled | Boolean | false | Whether MVCC is enabled for the Space |
space-config.mvcc.historical_entry_lifetime | Integer | 5 | Time limit for holding an entry version in the cache. This is the main criterion for deciding whether a particular entry version should be cleaned up |
space-config.mvcc.historical_entry_lifetime_timeunit | TimeUnit | m | Unit of the time limit (milliseconds (ms), seconds (s), minutes (m)…) |
space-config.mvcc.historical_entries_limit | Integer | 5 | Maximum number of historical entries per UID. Cannot be 0. Data lifetime takes precedence over this criterion: if the number of entries in the cache is below the limit but some entries are too old, they are purged |
space-config.mvcc.fixed_cleanup_delay_millis | Integer | 1000000 | Delay between cleanup iterations. Set to 0 to enable a dynamic delay based on previous cleanups |
The MVCC configuration settings can be adjusted to tune the impact on memory consumption.
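The interplay between the entry lifetime and the per-UID entries limit described in the table can be sketched in Java. This is an illustration of the stated rule, not GigaSpaces code: the class `HistoryCleanupSketch`, the method `retain`, and the exact comparison used (an entry is "too old" when its age is strictly greater than the lifetime) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the cleanup rule: lifetime takes precedence over the entries limit. */
class HistoryCleanupSketch {

    /**
     * Decides which historical versions of a single UID survive a cleanup pass.
     *
     * @param agesMillisNewestFirst ages of the historical versions, newest first
     * @param lifetimeMillis        maximum allowed age (assumed: strictly older entries are purged)
     * @param entriesLimit          maximum number of historical versions kept per UID
     * @return ages of the versions that are kept, newest first
     */
    static List<Long> retain(List<Long> agesMillisNewestFirst, long lifetimeMillis, int entriesLimit) {
        List<Long> kept = new ArrayList<>();
        for (long age : agesMillisNewestFirst) {
            boolean tooOld = age > lifetimeMillis;          // purged even if under the limit
            boolean overLimit = kept.size() >= entriesLimit; // purged even if still young
            if (!tooOld && !overLimit) {
                kept.add(age);
            }
        }
        return kept;
    }
}
```

With the default values from the table (a lifetime of 5 minutes and a limit of 5), a UID with three historical versions keeps only those younger than 5 minutes, and a UID with seven recent versions keeps only the five newest.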
Configuring a Space for MVCC
MVCC cannot be enabled on a Space that is already active. To use MVCC, a new Space has to be created.
To enable a Space for MVCC, perform the following steps:
-
Add a new Space by following the steps outlined in the User Guide: SpaceDeck - Spaces - Adding a Space.
-
In the Adding a New Space Parameters section, enable MVCC by adding the following under Context Properties/Property Name: space-config.mvcc.enabled=true
-
To change any of the other default parameters, add further Property Names.
-
Once completed, click Create Space.
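For readers who prefer to see the context properties as data, the following Java sketch collects a possible set of them in a standard `java.util.Properties` object. The property names come from the configuration table. The class `MvccSpaceProperties` and the non-default values are hypothetical, and how the properties are handed to the platform (the steps above use the SpaceDeck form) is not shown.

```java
import java.util.Properties;

/** Hypothetical helper that gathers MVCC context properties for a new Space. */
class MvccSpaceProperties {

    static Properties build() {
        Properties props = new Properties();
        // Required to enable MVCC on the new Space.
        props.setProperty("space-config.mvcc.enabled", "true");
        // Optional overrides; the values here are examples, not recommendations.
        props.setProperty("space-config.mvcc.historical_entries_limit", "3");
        // 0 requests a dynamic delay based on previous cleanups.
        props.setProperty("space-config.mvcc.fixed_cleanup_delay_millis", "0");
        return props;
    }
}
```

Each key/value pair corresponds to one Property Name entry in the Adding a New Space Parameters section.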
Querying an MVCC Enabled Space
-
The MVCC-enabled Space can be queried using the JDBCv3-compliant RESTful API services or the PostgreSQL-compliant data-gateway. For a persistent view, it is recommended that queries be part of an explicit transaction, because the consistency of the data cannot be guaranteed for non-transactional queries.
-
For developers, the MVCC-enabled Space can also be queried using the Java proxy APIs. These are limited to basic APIs such as single operations by ID (write, read, take, update), read-multiple operations, and read/take with template matching. Developers can also use the SQL JDBCv3 driver or the SQLQuery Java API.
-
MVCC Space operations are required to be transactional. This ensures that committed data is consistently reflected when fetched. A read operation without a transaction is only allowed when specifying the READ_COMMITTED isolation level modifier. Transactional reads can be performed with isolation levels such as DIRTY_READ, REPEATABLE_READ, or READ_COMMITTED, which is the default modifier for MVCC.
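The recommendation to run queries inside an explicit transaction can be sketched with standard JDBC calls. This is a minimal example under stated assumptions: the class `TransactionalQuerySketch` and the method `countRowsInTransaction` are hypothetical, the way the `Connection` is obtained (driver, URL, credentials) is deliberately left out, and it is assumed that the driver in use honors the usual `setAutoCommit(false)` / `commit()` transaction demarcation.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper that runs a read query inside an explicit JDBC transaction. */
class TransactionalQuerySketch {

    /** Executes the query in a transaction and returns the number of rows read. */
    static int countRowsInTransaction(Connection conn, String sql) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // begin an explicit transaction
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int rows = 0;
            while (rs.next()) {
                rows++; // all rows are read within the same transaction
            }
            conn.commit();
            return rows;
        } catch (SQLException e) {
            conn.rollback(); // discard the transaction on failure
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit); // restore the caller's setting
        }
    }
}
```

Passing the `Connection` in as a parameter keeps the sketch independent of any particular driver or endpoint.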
Limitations - Partial Support
-
When using an MVCC Space, the following Space operations are not supported:
-
readIfExists, readIfExistsById, asyncRead
-
takeMultiple, takeByIds, takeIfExists, takeIfExistsById, asyncTake
-
writeMultiple
-
change, asyncChange
-
count, clear, aggregate, execute, iterator, dropClass, executorBuilder, asyncLoad
-
MVCC is limited to the ALL IN CACHE configuration.
-
There is no support for other cache policies such as LRU (Least Recently Used, a common caching strategy that evicts the least recently used items first when the cache is full) or Tiered-Storage, or for cache topologies such as Local Cache/View.
-
Secondary unique indexes are not allowed.
-
More than one data pipeline per table is not supported. Each object type in the Space must be populated through one specific pipeline.
Performance Impact
-
Transaction throughput decreases by 5-7%.
-
MVCC adds on average a 25% RAM overhead.